Irreversible Monte Carlo Algorithms for Efficient Sampling 



Konstantin S. Turitsyn*"'^'''), Michael Chertkov Marija Vucelja^''''') 
" James Frank Institute, University of Chicago, Chicago, IL 60615, USA 
^ Center for Nonlinear Studies & Theoretical Division, LANL, Los Alamos, NM 87545, USA 
Landau Institute for Theoretical Physics, Moscow 142432, Russia 
Department of Physics of Complex Systems, Weizmann Institute of Sciences, Rehovot 76100, Israel 

Equilibrium systems evolve according to Detailed Balance (DB). This principe guided development of the 
Monte-Carlo sampling techniques, of which Metropolis-Hastings (MH) algorithm is the famous representative. 
It is also known that DB is sufficient but not necessary. We construct irreversible deformation of a given 
reversible algorithm capable of dramatic improvement of sampling from known distribution. Our transformation 
modifies transition rates keeping the structure of transitions intact. To illustrate the general scheme we design 
an Irreversible version of Metropolis-Hastings (IMH) and test it on example of a spin cluster. Standard MH for 
the model suffers from the critical slowdown, while IMH is free from critical slowdown. 
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Recent decades have been marked by fruitful interaction be- 

' tween physics and computer science, with one of the most 
striking examples of such interaction going back to 40s when 

' physicists proposed a Markov Chain Monte Carlo (MCMC) 
algorithm lil |2[| . MCMC evaluates large sums, or integrals, 

[ approximately, in a sense imitating how nature would do effi- 
cient sampling itself. Development of this idea has became 

I wide spread and proliferated a great variety of disciplines. 
(See Ol |4|, |5[| for a sample set of reviews in physics and com- 

' puter science.) If one formally follows the letter of the orig- 
inal MCMC suggestion one ought to ensure that the Detailed 

] Balance (DB) condition is satisfied. This condition reflects 
microscopic reversibility of the underlying equilibrium dy- 

[ namics. A reader, impressed with indisputable success of the 
reversible MCMC techniques, may still wonder if the equilib- 

[ rium dynamics is the most efficient strategy for sampling and 
evaluating the integrals? In this letter we argue that typically 

[ the answer is NO. Let us try to illustrate the ideas on a sim- 
ple everyday life example. Consider mixing sugar in a cup 
of coffee, which is similar to sampling, as long as the sugar 
particles have to explore the entire interior of the cup. DB 
dynamics corresponds to diffusion taking an enormous mix- 
ing time. This is certainly not the best way to mix. More- 
over, our everyday experience suggests a better solution - 
enhance mixing with a spoon. Spoon steering generates an 
out-of-equilibrium external flow which significantly acceler- 
ates mixing, while achieving the same final result - uniform 
distribution of sugar concentration over the cup. In this letter 

1 we show constructively, with a practical algorithm suggested, 
that similar strategy can be used to decrease mixing time of 
any known reversible MCMC algorithms. 

There are two main obstacles which prevent fast mixing by 
traditional MCMC methods. First, the effective energy land- 
scape can have high barriers, separating the energy minima. In 
this case mixing time is dominated by rare processes of over- 
coming the barriers. Second, slow mixing can originate from 
the high entropy of the states basin (too many comparably im- 
portant states) providing major contribution to the system par- 
tition function. In the later case mixing time is determined by 



the number of steps it takes for reversible (diffusive) random 
walk to explore all the relevant states. 

MCMC algorithms are best described on discrete example. 
Consider a graph with vertices i — 1, . . . ,Af each labeling 
a state of the system and edges {i — > j) corresponding to 
"allowed" transitions between the states. For instance, an A^- 
dimensional hypercube corresponds to a system of N spins 
(with Af ^ 2^ states) with single-spin flips allowed. An 
MCMC algorithm can be described in terms of the transition 
matrix Ty representing the probability of a single MCMC step 
from state j to state i. Probability of finding the system in 
state i at time t, Pf, evolves according to the following Mas- 
ter Equation (ME); P-^^ = Y.- T,jP*. Stationary solution of 
ME, Pf — TTi, satisfies the Balance Condition (BC): 

j 

Qij = TijTTj from the Ihs of Eq. ([T]i can be interpreted as the 
stationary probability flux from state i to state j. Obviously, 
stationarity of the probability flow reads as the condition for 
incoming and outgoing fluxes at any state to sum up to zero. 
Note also, that Eq. ([TJ is nothing but incompressibility condi- 
tion of the stationary probability flow. 

The DB used in traditional MCMC algorithms is a more 
stringent condition, as it requires the piecewise balance of 
terms in the sum (H): for any pair of states with allowed tran- 
sitions one requires, T^TTj = TjiTTj. The main reason for 
DB to be so often used in practice originates from its tremen- 
dous simplicity. Otherwise, DB-consistent schemes constitute 
only a small subset of all other MCMC schemes convergent 
to the same stationary distribution tt. From the hydrodynamic 
point of view reversible MCMC corresponds to irrotational 
probability flows, while irreversibility relates to nonzero ro- 
tational part, e.g. correspondent to vortices contained in the 
flow. Putting it formally, in the irreversible case antisym- 
metric part of the ergodic flow matrix is nonzero and it ac- 
tually allows the following cycle decomposition, Qij — Qji = 
J2a Jai^ij ^ ^fi)^ whcrc indcx a enumerates cycles on the 
graph of states with the adjacency matrices C,". Then, Jq, 



stands for the magnitude of the probability flux flowing over 
cycle a. 

This cycle representation suggests how an irreversible 
MCMC algorithms can be constructed. One simply adds cy- 
cles to a given reversible Markov Chain, ensuring that the 
transition probabilities remain positive. In some simple ex- 
amples this straightforward approach can be rather efficient. 
Consider, for example, the problem of sampling uniform dis- 
tribution on a two-dimensional lattice, with the characteristic 
size L ^ 1. Reversible Monte Carlo algorithms would re- 
quire approximately T ^ steps to converge. On the other 
hand, imposing a kind of super-lattice of size / cycles, where 
1 ^ Z <C L, one observes that typical mixing time is now 
determined by an interplay of usual and "turbulent" diffusion 
respectively. One estimates, T ^ P + L"^ /I, where the two 
terms correspond to dynamics on small (sub vortex-size) and 
large scales. The minimal value of T is achieved at I ^ L^/'^, 
thus resulting in T ~ L^/^. Moreover, one can reduce T even 
further to T ~ L with the help of an additional lifting opera- 
tion, see e.g. |0] for related discussion. The important lesson 
we draw from this example is that knowing the state space and 
carefully planting irreversible cycles one can indeed achieve a 
significant acceleration of mixing. 

However promising it looks, the cycle-based procedure has 
two serious caveats making it difficult to implement. First, 
the number of states in majority of interesting problems is (at 
least) exponential in the number of physical degrees of free- 
dom. (For example an Ising system consisting of N spins has 
2^ states.) Thus, one expects that the number of cycles suf- 
ficient for essential mixing improvement is also exponential, 
i.e. too large of a number for algorithmically feasible imple- 
mentation. Second, not all cycles are equally desirable, mak- 
ing optimization over cycle placements to be a task of even 
higher algorithmic complexity. 

Therefore, aiming to achieve practical and flexible im- 
plementation, we have to abandon the cycle-based idea and 
instead focus on a better alternative - building irreversible 
MCMC algorithms via controlled deformation of an existent 
reversible MCMC. To be more specific, in the remainder of 
this Letter we adopt and develop replication/lifting trick dis- 
cussed in 161 |7D . The main idea behind our strategy is as fol- 
lows. Instead of planting into the system an irreversible proba- 
bility flux, correspondent to an "incompressible" BC, we add 
a mixing desirable "compressible" flux, and compensate for 
its compressibility by building an additional replica with re- 
versed flux and allowing some inter-replica transitions. To 
enforce BC one tunes the replica switching probabilities com- 
puted "on the fly" (and locally). The replication idea is illus- 
trated in Fig. 1. Acknowledging generality of the setting, we 
focus in this Letter on explaining one relatively simple imple- 
mentation of this idea. Generalization and modifications of 
the procedure will be analyzed and discussed elsewhere. 

Consider reversible MCMC algorithm characterized by the 
transition matrix Tij which (a) obeys the DB condition, and 
(b) converges to the equilibrium distribution tt^. Assume that 
each state has duplicates in two replicas, marked by ±. Fol- 
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FIG. 1: Schematic representation of the replication deformation. 
Dashed lines represent replica switching transitions, which compen- 
sate for compressibility of the probability flows associated with solid 
lines. 

lowing some local rule (an example will be provided below) 
one introduces a split between states within each of the repli- 
cas, = T^^'^ + tI~\ such that all T^p^ are positive and 

satisfy, Vi 7^ j, T-^^Trj = to be cafled the skew DB 

condition. The total transition matrix, 

also contains nonzero and positive (as probabilities) inter- 
replica terms, A^^'^\ allowing transitions only between 
two replicas of the same state. One tunes the inter- 
replica terms to ensure convergence to the given steady dis- 
tribution, TTi. This is achieved by choosing, a£*'^^ = 
^^ax^^Q,J2jTit^ -T^P^ > 0, and the diagonal terms 
Tl/j^'' are fixed according to the stochasticity condition: 

T^t^ = 1 - Tj^^ - ^''il'^^- This description completes 
our construction of an irreversible MCMC algorithm from a 
given reversible one. Note that this construction is not unique, 
and in general multiple choices of A^^^'^^ are possible. The 
proposed scheme is illustrated below on example of a simple 
spin system, with the Metropolis-Hastings (MH)-Glauber al- 
gorithm chosen as the respective reversible prototype. 

MH ill is the most popular reversible MCMC algorithm. 
MH-transition from a current state i is defined in two steps. 
(A) A new state j is selected randomly. (B) The proposed 
state is accepted with probability pacc = min(l, TTj/ni) or re- 
jected with the probability 1 —pacc respectively. Selecting the 
proposed state i.i.d. randomly from all possible single spin 
flips corresponds to the Glauber dynamics popular in simula- 
tions of spin systems. Let us now explain how to build an ir- 
reversible MCMC algorithm for spin systems based on the re- 
versible MH-Glauber algorithm. One considers separation in 
two replicas according to the sign value, + or — , of the spin to 
be flipped. Then, our irreversible MH-Glauber scheme works 
as follows. Spin a is selected i.i.d. randomly from the pool of 
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FIG. 2: Correlation time of the total spin de-correlation in the spin 
cluster model. Dots correspond to the direct diagonalization of the 
transition matrices. Crosses are correlation times found from re- 
spective MCMC simulations. Blue and red colors correspond to re- 
versible and irreversible algorithms respectively. Best fitting slopes 



are given by Tre. 



and Ti, 



all other spins of the system having + or — values, depending 
on the sign of the replica where the system stays. The selected 
spin is flipped with the probability pacc = min(l, Trj/iri), in 
which case the system stays in the same replica. If the flip 
is not accepted the state is switched to its counterpart of the 



other repUca with probabiHty A,^+' 7(1 - TjfO. (These 
transitions are indicated as dash lines in Fig.[T]) Note, that in 
the case of the Glauber dynamics both A\f'^^ and J2j "^ji^^ 
are local quantities depending only on the current state of the 
system, and calculating transition probabilities constitutes an 
insignificant computational overhead. 

We choose iV-spins ferromagnetic cluster (equal strength 
interaction between all the spins) as a testbed and discuss 
sampling from respective stationary distribution, tTj^ s„ ~ 



exp 



(J/2iV)E 



k.k- 



SfcSfe' 



Note, that a state of the sim- 
ple system is completely characterized by its global spin, 
S = J2k ^k, and respective probability distribution, P{S) ~ 
jy^^jy , exp [JS^/{2N)], where N± = {N ± S)/2 is the 
number of positive/negative spins. Considered in the ther- 
modynamic limit, iV ^ oo, the system undergoes a phase 
transition at J = 1. Away from the transition in the param- 
agnetic phase, ,/ < 1, P{S) is centered around 5 = and 
the width of the distribution is estimated by 6S ^ ^^N/J, 
which changes to SS ^ N^/^ at the critical point J = 1. 
One important consequence of the distribution broadening is 
a slowdown observed at the critical point for reversible MH- 
Glauber sampling. Then characteristic correlation time of S 
(measured in the number of Markov chain steps) is estimated 
as Trev ~ ('^•S')^, and the computational overhead associated 
with the critical slowdown is ~ ^/N. We brought this sim- 
ple model to illustrate advantage of using irreversibility. As 
shown below, the irreversible modification of the MH-Glauber 
algorithm applied to the spin cluster problem achieves com- 
plete removal of the critical slowdown. To estimate the corre- 



lation time in the irreversible case we first note that, switching 
from one replica to another the system always go through the 
5 = state. ( This follows directly from the observation that 
A,|,^'~'' = for the states with 5 > and A,|~'^'' = for the 
states with 5 < 0.) The Markovian nature of the algorithm 
implies that all the trajectories connecting two consequent 
5 = 0-swipes are statistically independent, and therefore cor- 
relation time is roughly equal to the typical number of steps in 
each of these trajectories. Recalling that inside a replica (i.e. 
in between two consecutive swipes) dynamics of 5 is strictly 
monotonous, one estimates T!;,.,, ~ 5S. This estimate suggests 
a significant acceleration: Ti^r ^ -sJTrev T,.ev Note, that 
one expects to observe significant acceleration even outside of 
the critical domain, for larger and smaller values of J. 

We verified the correlation time estimation via numerical 
tests. Implementing reversible and irreversible versions of 
the MH-Glauber algorithm we, first, analyzed decay of the 
pair correlation function, (5(0)5(i)), with time. Respective 
correlation time was reconstructed by fitting the large time 
asymptotics with exponential function, cxp{—t/Trev), and 
exponential-oscillatory function, exp(— t/T^rr) cos{ujt — ip), 
in the reversible and irreversible cases respectively. Sec- 
ond, for both MH and IMH algorithms we constructed tran- 
sition matrix corresponding to the random walk in 5, cal- 
culated spectral gap. A, related to the correlation time as, 
T = 1/ReA. In both tests we analyzed critical point J = 1 
and used different values of N ranging from 16 to 4096. 
Simulation results are shown in Fig. |2] The results found 
for two settings are consistent with each other Numerical 
values (Trev ^ N^-'^^ and T,rr ^ N°-^^) are also in a 
reasonable agreement with respective theoretical predictions 
(Trev ~ iV^/^ and T„r N^^*) while a sHght discrep- 
ancy can be attributed to finite size effects. Note, that in the 
irreversible case correlation time of the global spin correla- 
tion function (number of respective MC steps) grows with 
the number of spins, N, but does it slower than linearly. In 
other words, mixing becomes so efficient that equilibration of 
the global spin correlations is observed even before all spins 
of the systems are flipped. One concludes, that performance 
of the irreversible scheme is at least as good as the one of 
the cluster algorithms [l^l tested on the spin cluster model 
I 111 [lit] . (We note, however, that direct comparison of the two 
algorithms is not straightforward, as the cluster algorithm flips 
many spins at once and therefore its convergence is normally 
stated in renormalized units.) 

Let us now discuss relation of the proposed algorithm to 
previous studies. Although potential power of algorithms with 
broken DB has been realized for already a while, only handful 
of irreversible examples have been proposed so far. One of the 
examples is the sequential updating algorithm lIlBIl designed 
to simulate two-dimensional Ising system. In the essence, 
the algorithm consists of a number of subsystems (replicas) 
with internal dynamics, each characterized by its own transi- 
tion matrix. In a great contrast with our algorithm, the system 
switches between replicas in a predefined deterministic fash- 
ion. Similar idea of breaking DB by switching irreversibly 
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but periodically between reversible portions was implemented 
in the successive over-relaxation algorithm of il4ll . Impor- 
tant samp ling algorithm with DB broken is the Hybrid Monte 
Carlo of 11511 . where Hamiltonian dynamics is used to accel- 
erate sampling. Once again the story here relates to replicas, 
each parameterized by distinct momentum, with switches be- 
tween the replicas controlled deterministic ally by the under- 
lying Hamiltonian. It is also appropriate to cite relevant ef- 
forts ori gina ted in statistics 101, mathematics and computer 
science lll7ll . Several simple examples of irreversible algo- 
rithms were discussed and analyzed in 101. iSl showed that 
improvement in mixing, provided by a multi-replica lifting, 
does not allow reduction stronger than the one observed in the 
diffusive-to-ballistic scenario, T — > VT, where T is the mix- 
ing time of the underlying reversible algorithm. The grain of 
salt here is that the acceleration was achieved via a replication 
of an extremely high, ~ k^, degree where k is the number of 
states. ITtIi showed that some complementary ideas, from the 
field of distributed networks, allows to reduce this replication 
scaling a bit. 

We also find it useful to briefly discuss reversible algo- 
rithms showing certain similarity to the algorithm and ideas 
of the Letter. First of all, it is important to mention again 
cluster algorithms [13, [3l which were most successful in 
biting the odds of the critical slowdown in the regular sys- 
tems of the Ising type. The trick here is to explore duality 
of the model, which allows two alternative representations re- 
lated to each other via a state-non-local transformation. The 
cluster algorithm switches between two dual representations, 
thus realizing long jumps in the phase space. Note that such 
jumps would be forbidden by a phase-space local dynamics in 
either of the two representations. Best algorithms of the clus- 
ter type achieves very impressive rate of convergence. The 
downside is in the fact that the cluster algorithms are model 
specific and rather difficult in implementation because of ex- 
treme phase space non-locality of the steps. Worm algorithm 
of iflill allows essential reduction in the critical slowdown via 
mapping to a high-temperature-inspired loop representation 
and making local moves there. The last but not the least, we 
mention the simulated annealing algorithm of lfl9ll built on a 
temperature-graded replication consistent with DB. One inter- 
esting direction for future research is to explore if (and under 
which conditions) additional irreversibility can improve al- 
ready good mixing performance provided within each of these 
reversible algorithms. 

To summarize, this letter describes how to upgrade a re- 
versible MC into an irreversible MC converging to the same 
distribution faster. To prove the concept we design a spin- 
problem specific irreversible algorithm, and tested it on the 
mean-field spin-cluster model. We showed on this exam- 
ple that the irreversible modification can lead to dramatic ac- 
celeration of MC mixing. Our results suggest that the irre- 



versible MC algorithms are especially beneficial for accel- 
eration of mixing in systems containing multiple soft and 
zero modes, however unaccessible for standard (reversible) 
schemes. This situation occurs typically in systems experi- 
encing critical slowdown in the vicinity of a phase transition, 
and it is also an inherent property of systems possessing in- 
ternal symmetries of high degree. Entropic degeneracy is the 
main factor limiting the convergence of regular MCMC algo- 
rithm in these problems. To conclude, the ideas discussed in 
the letter might be useful in studies of phase transitions, soft 
matter dynamics, protein structures and granular media. 
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